# always clean up R environment
rm(list = ls())
# load all packages here
# Basic Data Analysis & Wrangling
library(tidyverse)
library(lubridate)
# Libraries for segmenting Chinese text into words.
library(jiebaRD)
library(jiebaR)
# Library for generating word cloud.
library(wordcloud)
# Library for visualizing 3D plots.
library(plotly)
I - Introduction
Investopedia.com defines social good as “something that benefits the largest number of people in the largest possible way, such as clean air, clean water, healthcare and literacy.” Social good is also referred to as the “common good.” In this project, we discuss an issue that is hotly debated within the developer community. Before examining it in depth, we outline some background that shows why it matters:
Overall Chinese Internet environment:
Several Western companies have tried to enter the lucrative mainland Chinese market, and many of them ran into obstacles. A famous example is Google, whose Dragonfly project was designed specifically for the Chinese market and failed because of regulation as well as a lack of interest from the general public. eBay was one of the earliest companies to set its sights on the Asian market. In our view, these big companies all failed for the same reason: they could not integrate into a culture incompatible with their own. We personally call it an “attempt at cultural imperialism within the field of the Internet.” The big companies had expanded successfully into many other places without changing much of their business model or their approach to the local audience. However, there is a tremendous cultural difference between Western and Eastern societies. For example, the younger generation in the United States uses Facebook, Twitter, Instagram, Snapchat and other social platforms simultaneously, whereas teens in China mostly rely on a single app, WeChat, which can also pay bills, hail a ride, book a movie ticket or find a restaurant. Differences in consumer behavior and culture are what held these big companies back.
The Chinese government is known for its regulation and censorship. According to an article published by The New York Times, the new president hopes to use the Internet to strengthen the Communist Party’s role in society. The majority of the young generation is indifferent to politics, although many are affected by censorship, and some are themselves employed as censorship factory workers.
Having briefly outlined how the Chinese Internet environment differs from that of the United States and other developed countries, we now turn to today’s topic of interest: the 996.ICU event.
996.ICU refers to the grueling and illegal working hours at many tech companies in China: from 9 a.m. to 9 p.m., 6 days a week. The name “996.ICU” comes from the description in the repository: “By following the ‘996’ work schedule, you are risking yourself getting into the ICU (Intensive Care Unit).” The event reached its peak when Jack Ma, the founder of the e-commerce giant Alibaba Group, made the following remark in mid-April 2019: “It is a huge blessing that we can work 996.” Alibaba owns what is often described as the Amazon of China, as well as the biggest cloud computing platform in mainland China. He said, “If you do not do 996 when you are young, when will you do it? If you don’t put more time and energy than others, how can you achieve the success you want?” These remarks drew controversy both inside and outside the country. At the time of writing, the 996.ICU repository ranks No. 2 on the Trending page of GitHub, the world’s largest developer community, right behind the repository of algorithms implemented in Python. Microsoft and GitHub workers have started their own repository in support of the 996.ICU movement.
This movement is personally significant to our team, because both of us have interacted with the companies mentioned above and have friends and family who work in tech in China. We have witnessed the consequences of long hours and unproductive work in China’s tech industry. On the one hand, the Chinese Internet environment differs from that of the United States and is at least 10 years behind it. On the other hand, pressuring workers to work long hours will not by itself close that gap, nor will it benefit technological progress.
For the rest of our project, we use text analysis together with supervised and unsupervised learning techniques to dive deep into the problem.
II - Data Preprocessing
Load datasets
dt_issues <- read.csv("data/issues_data.csv", header = TRUE)
dt_star <- read.csv("data/stargazers.csv", header = TRUE)
dt_user <- read.csv("data/users_data.csv", header = TRUE)
Sanity Check
# Inspect the dataset by taking the first 10 rows of each dataset.
dt_issues %>% head(10)
dt_star %>% head(10)
dt_user %>% head(10)
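Beyond eyeballing the first rows, a sanity check can also count rows, columns, missing values and duplicated rows. The snippet below is a minimal sketch of that idea in base R, not part of the original analysis: the helper `check_dataset` and the toy data frame with its column names are hypothetical and exist only for illustration.

```r
# Hypothetical helper (an assumption, not from the original project):
# summarise the shape, missing values and duplicated rows of a data frame.
check_dataset <- function(df) {
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    # Number of NA values in each column.
    n_missing = colSums(is.na(df)),
    # Number of rows that exactly repeat an earlier row.
    n_duplicated = sum(duplicated(df))
  )
}

# Toy data frame standing in for one of the loaded datasets;
# the column names are made up for this example.
toy <- data.frame(
  id = c(1, 2, 2, 4),
  title = c("a", "b", "b", NA),
  stringsAsFactors = FALSE
)

result <- check_dataset(toy)
str(result)
```

Assuming the three datasets loaded above, the same helper could be applied to all of them at once with `lapply(list(issues = dt_issues, star = dt_star, user = dt_user), check_dataset)`.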